Search results for all records where Creators/Authors contains: "Liu, Zhun"


  1. Transfer learning using ImageNet pre-trained models has been the de facto approach in a wide range of computer vision tasks. However, fine-tuning still requires task-specific training data. In this paper, we propose N3 (Neural Networks from Natural Language), a new paradigm for synthesizing task-specific neural networks from language descriptions and a generic pre-trained model. N3 leverages language descriptions to generate parameter adaptations as well as a new task-specific classification layer for a pre-trained neural network, effectively "fine-tuning" the network for a new task using only language descriptions as input. To the best of our knowledge, N3 is the first method to synthesize entire neural networks from natural language. Experimental results show that N3 can outperform previous natural-language-based zero-shot learning methods across four different zero-shot image classification benchmarks. We also demonstrate a simple method for identifying the keywords in language descriptions that N3 leverages when synthesizing model parameters.
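A minimal sketch of the idea described in the abstract above, assuming a hypothetical hypernetwork that maps a task-description embedding to an additive feature adaptation plus a synthesized classification head for a frozen pre-trained backbone. The module names, dimensions, and single-layer adaptation are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class N3Sketch(nn.Module):
    """Hypothetical N3-style synthesis: language -> adaptations + classifier."""

    def __init__(self, text_dim=512, feat_dim=2048, num_classes=10):
        super().__init__()
        # Generates an additive adaptation for the backbone's final feature layer.
        self.adapt_gen = nn.Linear(text_dim, feat_dim)
        # Generates the weights of a task-specific classification layer.
        self.head_gen = nn.Linear(text_dim, feat_dim * num_classes)
        self.num_classes = num_classes
        self.feat_dim = feat_dim

    def forward(self, image_feats, desc_embedding):
        # image_feats: (B, feat_dim) from a frozen ImageNet-pre-trained model.
        # desc_embedding: (1, text_dim) encoding of the task's language description.
        adapted = image_feats + self.adapt_gen(desc_embedding)  # (B, feat_dim)
        W = self.head_gen(desc_embedding).view(self.num_classes, self.feat_dim)
        return adapted @ W.t()  # (B, num_classes) logits for the new task

Generating the classifier weights from text, rather than learning them from labeled examples, is what would let such a model be "fine-tuned" for a new task from a description alone.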
  2. The Duchenne smile hypothesis holds that smiles that include eye constriction (AU6) are the product of genuine positive emotion, whereas smiles that do not are either falsified or related to negative emotion. This hypothesis has become highly influential and is often used in scientific and applied settings to justify the inference that a smile is either true or false. However, empirical support for this hypothesis has been equivocal, and some researchers have proposed that, rather than being a reliable indicator of positive emotion, AU6 may simply be an artifact produced by intense smiles. Initial support for this proposal has been found when comparing smiles related to genuine and feigned positive emotion; however, the proposal has not yet been examined when comparing smiles related to genuine positive and negative emotion. The current study addressed this gap in the literature by examining spontaneous smiles from 136 participants during the elicitation of amusement, embarrassment, fear, and pain (from the BP4D+ dataset). Bayesian multilevel regression models were used to quantify the associations between AU6 and self-reported amusement while controlling for smile intensity. Models were estimated both to infer amusement from AU6 and to explain the intensity of AU6 using amusement. In both cases, controlling for smile intensity substantially reduced the hypothesized association, whereas the effect of smile intensity itself was large and reliable. These results provide further evidence that the Duchenne smile is likely an artifact of smile intensity rather than a reliable and unique indicator of genuine positive emotion.
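As a rough illustration of the analysis described above, the sketch below fits a multilevel regression predicting self-reported amusement from AU6 while controlling for smile intensity, with random intercepts per participant. The study used Bayesian multilevel models; this stand-in uses a frequentist mixed model from statsmodels instead, and the file and column names (smile_annotations.csv, amusement, au6, smile_intensity, participant) are assumptions.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-smile data: one row per smile, nested within participants.
df = pd.read_csv("smile_annotations.csv")

# Without the smile-intensity control, AU6 appears predictive of amusement...
m1 = smf.mixedlm("amusement ~ au6", df, groups=df["participant"]).fit()

# ...but adding smile intensity as a covariate shrinks the AU6 effect,
# mirroring the pattern the abstract reports.
m2 = smf.mixedlm("amusement ~ au6 + smile_intensity", df,
                 groups=df["participant"]).fit()

print(m1.summary())
print(m2.summary())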
  3. Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which those words appear. To better model human language, we first build expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
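A hypothetical sketch of the word-shifting idea described above: summarize the visual and acoustic subword sequences accompanying a word, then shift that word's embedding by a gated nonverbal vector. RAVEN's attention mechanism is simplified to a sigmoid gate here, and all module names and feature dimensions are illustrative assumptions rather than the paper's implementation.

import torch
import torch.nn as nn

class NonverbalShiftSketch(nn.Module):
    """Shift a word embedding using accompanying visual/acoustic cues."""

    def __init__(self, word_dim=300, visual_dim=47, acoustic_dim=74):
        super().__init__()
        # Summarize the nonverbal subword sequences into fixed-size vectors.
        self.visual_rnn = nn.LSTM(visual_dim, word_dim, batch_first=True)
        self.acoustic_rnn = nn.LSTM(acoustic_dim, word_dim, batch_first=True)
        # Gate controls how strongly nonverbal context shifts the word vector.
        self.gate = nn.Linear(3 * word_dim, word_dim)
        self.shift = nn.Linear(2 * word_dim, word_dim)

    def forward(self, word_vec, visual_seq, acoustic_seq):
        # word_vec: (B, word_dim); visual_seq / acoustic_seq: (B, T, feat_dim),
        # the nonverbal frames observed during this word's time span.
        _, (v, _) = self.visual_rnn(visual_seq)
        _, (a, _) = self.acoustic_rnn(acoustic_seq)
        v, a = v.squeeze(0), a.squeeze(0)  # (B, word_dim) summaries
        g = torch.sigmoid(self.gate(torch.cat([word_vec, v, a], dim=-1)))
        s = self.shift(torch.cat([v, a], dim=-1))
        return word_vec + g * s  # word representation shifted by nonverbal context

The key design point the abstract describes is that the shift is computed per word from the nonverbal frames aligned to that word, so the same word can receive different representations in different vocal or facial contexts.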